Integrated omics analysis of coronary artery calcifications and myocardial infarction: the Framingham Heart Study

Gene function can be described using various measures. We integrated association studies of three types of omics data to provide insights into the pathophysiology of subclinical coronary disease and myocardial infarction (MI). Using multivariable regression models, we associated (1) single nucleotide polymorphisms, (2) DNA methylation, and (3) gene expression with coronary artery calcification (CAC) scores and MI. Among 3106 participants of the Framingham Heart Study, 65 (2.1%) had prevalent MI and 60 (1.9%) had incident MI, the median CAC value was 67.8 [IQR 10.8, 274.9], and 1403 (45.2%) had CAC scores > 0 (prevalent CAC). Prevalent CAC was associated with AHRR (linked to smoking) and EXOC3 (affecting platelet function and promoting hemostasis). CAC score was associated with VWA1 (extracellular matrix protein associated with cartilage structure in endomysium). For prevalent MI we identified FYTTD1 (down-regulated in familial hypercholesterolemia) and PINK1 (linked to cardiac tissue homeostasis and ischemia–reperfusion injury). Incident MI was associated with IRX3 (enhancing browning of white adipose tissue) and STXBP3 (controlling trafficking of glucose transporter type 4 to plasma). Using an integrative trans-omics approach, we identified both putatively novel and known candidate genes associated with CAC and MI. Replication of findings is warranted.


Omics measures
Blood samples were collected during 1998-2008 for the Offspring cohort (examination cycles 7 and 8) and during 2002-2005 for the Offspring Spouse and Third Generation cohorts (1st examination cycle).
The Affymetrix 550 k Array (Affymetrix, Santa Clara, CA) was used for profiling of genetic variants, which were then imputed to the 1000 Genomes Project reference panel using MaCH (v 1.0.15) 9; only variants with an imputation quality greater than 0.3 were retained. DNA methylation was available for participants in the Offspring cohort from the 8th examination cycle and in the Third Generation cohort from the 2nd examination cycle. DNA methylation profiles were measured from whole-blood-derived DNA using the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA) 10. Rigorous quality control was performed and only high-quality CpG sites were kept. Gene expression profiling was performed on RNA isolated from fasting peripheral whole blood using the Affymetrix Human Exon 1.0 ST Array (Affymetrix, Santa Clara, CA) for the Offspring and Third Generation cohorts at the same exams at which DNA methylation was assessed. More details on the profiling of omics data are available in previous publications 6,11.

Outcomes
To investigate associations between gene function and coronary disease development and progression, respectively, a total of four outcomes were analyzed: CAC as a dichotomous variable (presence or absence of CAC), CAC as a continuous variable (excluding those with a CAC score of 0), prevalent MI, and incident MI. Separate analyses were undertaken for prevalent and incident MI to avoid incorrect handling of time (since pooling prevalent and incident cases may lead to spurious associations in opposite directions). The two measures of CAC were chosen to allow for investigations of associations between gene function and (1) development of calcification (CAC as a dichotomous variable) and (2) the extent of calcification among subjects with CAC (CAC as a continuous variable). MI was diagnosed by ECG, enzymes and history, or autopsy evidence. CAC prevalence was estimated from computed tomography (CT) scans of the coronary arteries undertaken for the Offspring cohort in 1998-2001 (7th examination cycle) and the Third Generation cohort in 2002-2005 (1st examination cycle), and the extent of CAC was quantified by the Agatston score 12. The Agatston score is computed by multiplying the lesion area by a weighted attenuation score (based on the maximal attenuation within the lesion), where a calcified lesion is an area including at least 3 connected pixels with CT attenuation > 130 Hounsfield units 13.
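As an illustration of the Agatston computation described above, the following is a minimal sketch and not the scoring software used in the study. The attenuation bins (130-199, 200-299, 300-399, and 400 or more Hounsfield units mapped to weights 1 to 4) are the conventional Agatston weights and are an assumption here, since the text only states that the weight depends on the maximal attenuation. The function names and the example lesions are hypothetical.

```python
def density_weight(max_hu: float) -> int:
    """Weight derived from the peak attenuation (Hounsfield units) in a lesion.

    Bins follow the conventional Agatston scheme (assumed, not given in the text).
    """
    if max_hu <= 130:
        return 0  # at or below the calcification threshold stated in the text
    if max_hu < 200:
        return 1
    if max_hu < 300:
        return 2
    if max_hu < 400:
        return 3
    return 4


def lesion_score(area_mm2: float, max_hu: float) -> float:
    """Score of one lesion: area multiplied by the attenuation weight."""
    return area_mm2 * density_weight(max_hu)


def agatston_score(lesions) -> float:
    """Sum of lesion scores; `lesions` is an iterable of (area_mm2, max_hu)."""
    return sum(lesion_score(area, hu) for area, hu in lesions)


# Hypothetical lesions: (area in mm^2, peak attenuation in HU)
lesions = [(4.0, 150.0), (2.5, 320.0), (6.0, 450.0)]
total = agatston_score(lesions)  # 4.0*1 + 2.5*3 + 6.0*4
```

The sketch assumes lesions have already been segmented and that each one satisfies the minimum size of 3 connected pixels; that step is not modelled.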

Statistics
The analysis was performed in three steps. In the first step, each omics measure (GWAS, DNA methylation, and gene expression) was tested for association with each of the four outcomes in a regression model. Given the familial relatedness among FHS participants, generalized estimating equations were used to test the association of omics measures with the presence of CAC and prevalent MI. Similarly, linear mixed models were used to assess the association between omics measures and CAC values. In addition, Cox proportional hazards models clustering on pedigrees were used to assess the association between omics measures and incident MI. All models were adjusted for age, sex, weight, height, and technical covariates.
In the second step, the associations of each of the four outcomes with each type of omics measure (GWAS, DNA methylation, and gene expression) were summarized at the gene level. For the genetic associations, the most significant genetic variant within each gene region was used to represent the overall association of the gene with the outcomes. Similarly, the most significant CpG site within each gene region was used to represent the overall association of the methylation profile with the outcomes. For gene expression, the most significant transcript was used to represent the association of each gene with the outcomes. Finally, we used robust rank aggregation to integrate the top 5% of associations from the three omics data types. This method tests how much better a gene is positioned in the ranked lists than would be expected by chance, which is formalized by random shuffling of the ranked lists 5. A trans-omic score was calculated to represent the significance of each gene across the different omics data types. A full description of the statistical test is available elsewhere 5. The test results for the 10 genes with the lowest trans-omic scores are shown for each outcome. Additionally, we highlighted genes that were among the top genes by trans-omic score for more than one of the studied outcomes. All trans-omic scores and the p-values from the individual analyses are available in Tables S1, S2, S3, and S4 in the Supplemental Data, which may be used by other researchers as a reference.
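For orientation, the three model families named in the first step can be written schematically as follows. This is a sketch under stated assumptions rather than the exact specification used: the logit link for the binary outcomes, the family-level random intercept b_i in the mixed model, and the absence of any transformation of the CAC score are all assumptions, as are the symbols themselves.

```latex
% i indexes pedigrees, j indexes participants within a pedigree;
% x_{ij} is a single omics measure, z_{ij} the covariate vector
% (age, sex, weight, height, technical covariates).
\begin{align*}
\text{GEE (prevalent CAC, prevalent MI):}\quad
  & \operatorname{logit}\Pr(Y_{ij}=1) = \beta_0 + \beta_1 x_{ij} + \gamma^\top z_{ij} \\
\text{Linear mixed model (CAC score):}\quad
  & y_{ij} = \beta_0 + \beta_1 x_{ij} + \gamma^\top z_{ij} + b_i + \varepsilon_{ij},
    \quad b_i \sim N(0,\sigma_b^2),\ \varepsilon_{ij} \sim N(0,\sigma^2) \\
\text{Cox model (incident MI):}\quad
  & h_{ij}(t) = h_0(t)\exp\!\left(\beta_1 x_{ij} + \gamma^\top z_{ij}\right)
\end{align*}
```

In each case the quantity carried forward to the gene-level summary is the p-value for the omics coefficient, with within-pedigree correlation handled by the working correlation (GEE), the random effect (mixed model), or cluster-robust variance (Cox).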
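The gene-level summary and the rank aggregation can be sketched as below. This is an illustration of the general idea behind robust rank aggregation as commonly described (order-statistic probabilities under a uniform null), not the authors' implementation; the top 5% filter and any multiple-testing correction are not modelled, and all function names, gene labels, and p-values are made up.

```python
from math import comb


def gene_level_p(feature_pvalues):
    """Keep the smallest p-value per gene.

    `feature_pvalues` is an iterable of (gene, p) pairs, one pair per
    variant, CpG site, or transcript mapped to that gene.
    """
    best = {}
    for gene, p in feature_pvalues:
        if gene not in best or p < best[gene]:
            best[gene] = p
    return best


def normalized_ranks(gene_p):
    """Map each gene to rank / number of genes (smallest p gets 1/n)."""
    ordered = sorted(gene_p, key=gene_p.get)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def binom_tail(k, n, r):
    """P(at least k of n independent uniform(0, 1) ranks are <= r)."""
    return sum(comb(n, l) * r**l * (1 - r) ** (n - l) for l in range(k, n + 1))


def rho_score(ranks):
    """Smallest order-statistic probability across a gene's normalized ranks.

    A low value means the gene sits nearer the top of the lists than
    randomly shuffled lists would be expected to place it.
    """
    rs = sorted(ranks)
    n = len(rs)
    return min(binom_tail(k, n, r) for k, r in enumerate(rs, start=1))


# Made-up p-values for one omics layer: two CpG sites map to GENE_A.
methylation = gene_level_p([("GENE_A", 0.03), ("GENE_A", 0.001), ("GENE_B", 0.2)])

# Made-up normalized ranks of two genes in the three omics layers.
consistent = rho_score([0.01, 0.02, 0.05])   # near the top in every layer
unremarkable = rho_score([0.5, 0.5, 0.5])    # mid-list in every layer
```

Under this sketch, a gene that ranks near the top in all three layers receives a much lower score than one that is mid-list everywhere, which matches the text's use of the lowest trans-omic scores to select top genes.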

Integration of different outcomes
Finally, we compared the lists of the top 100 genes with the lowest trans-omic scores for each outcome and identified genes shared between lists (Fig. 1). In total, 13 genes were found to associate with more than one of the outcomes. These genes included PDCD6-AHRR, AHRR, EXOC3 and SLC9A3-AS1 (Table 5). We further examined the enrichment of top genes in biological pathways, and found that the top enriched pathways include Aryl-Hydrocarbon Receptor Repressor. Can bind to nuclear factor-kappa B (NFKB) and may be immune modulating 16. It has previously been associated with smoking 17. AHRR DNA methylation has also been associated with carotid intima-media thickness 18. AHRR expression is increased in atherosclerotic lesions of mice and may in conjunction with other genes (such as TCF21) activate an inflammatory response in the coronary artery smooth vessels 19. Chr. 5 (p15.
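The comparison of top-gene lists across outcomes amounts to counting how many lists each gene appears in. A minimal sketch follows, with hypothetical outcome keys and placeholder gene labels; the real lists contain 100 genes per outcome.

```python
from collections import Counter


def shared_genes(top_lists, min_outcomes=2):
    """Return genes that appear in the top list of at least `min_outcomes` outcomes.

    `top_lists` maps an outcome name to its list of top-ranked genes.
    """
    counts = Counter(
        gene for genes in top_lists.values() for gene in set(genes)
    )
    return sorted(gene for gene, c in counts.items() if c >= min_outcomes)


# Placeholder lists, shortened for illustration.
top_lists = {
    "prevalent_cac": ["GENE_A", "GENE_B", "GENE_C"],
    "cac_score": ["GENE_B", "GENE_D"],
    "prevalent_mi": ["GENE_E", "GENE_A"],
    "incident_mi": ["GENE_F"],
}
shared = shared_genes(top_lists)
```

Each list is converted to a set before counting so that a gene repeated within one outcome is not mistaken for a match across outcomes.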

Discussion
In this analysis, we integrated the association results from three types of omics data (GWAS, DNA methylation, and gene expression) to identify molecular signatures related to CAC and MI. We also provided a full list of genes from the analyses in the online material, allowing other researchers to access all our results. It is important to point out that the present study did not apply any formal cutoff to claim statistical significance, and the results from this and prior studies are therefore not directly comparable. In this context, the top loci did not reach the conventional genome-wide significance cutoff. For many of the top-ranked genetic loci, there are other levels of evidence suggesting that they may be involved in the pathogenesis of coronary disease, as discussed in the next sections, which also aligns with pathophysiological pathways of atherosclerosis identified in previous studies 60.
Among the top 10 genes associated with CAC levels (excluding those with no CAC), there were 4 genes located in proximity to each other on chromosome 5 (PDCD6-AHRR, AHRR, EXOC3, and SLC9A3-AS1). Aryl-Hydrocarbon Receptor Repressor (AHRR), which can bind to nuclear factor-kappa B (NFKB) and may be immune modulating 16, has previously been reported to be upregulated among smokers. Further, variation in DNA methylation in the AHRR gene has previously been associated with carotid plaque scores, even after adjustment for smoking status 17. The AhR pathway, which can be activated by smoking, can increase the expression of inflammatory markers in macrophages and is involved in the buildup of lipids in macrophages and the formation of plaque 17,61. EXOC3 is important for controlling granule secretion and glycoprotein receptor trafficking in platelets, and in EXOC3 conditional knockout mice arterial thrombosis was found to be accelerated along with improved hemostasis 20. The sodium proton exchanger subtype 3 (SLC9A3) is highly expressed in the small intestine and colon, where it absorbs salt in the gastrointestinal tract and affects the extracellular fluid volume and blood pressure. SLC9A3 is a potential drug target for hypertension by reducing salt uptake in the gut 21. These genes were also among the top genes across all outcomes.
As expected from what we know of vascular biology, several of the top genes are known to be involved in inflammation, macrophage signaling, and endothelial function. None of these genes has, however, been firmly identified by GWAS previously. For instance, HAPLN2, which binds hyaluronan (a molecule expressed in relation to inflammatory signaling that appears to be involved in the progression of atherosclerotic plaques), was among the top genes for CAC presence 14,62. The tumor necrosis factor (TNF) receptor type 1 (prevalent CAC), A-kinase anchor protein 8 (a protein that binds to protein kinase A, prevalent MI), and Cyclin G Associated Kinase (a transcriptional target of the p53 tumor suppressor gene, prevalent CAC) all appear to be downstream targets of the TNF-alpha signaling pathways 39. ST14 (Matriptase, also known as PRSS14/Epithin) represents another potentially interesting pathway that may relate to macrophage migration into the arterial walls. It has previously been reported to be involved in the transendothelial migration of activated macrophages 59.
Moreover, several genes have previously been implicated in lipid metabolism, including ALP1, which is involved in intestinal fat absorption 30. ALP1 deficiency is linked to the metabolic syndrome and ischemic heart disease in humans 31. CLEC4F, identified for continuous CAC, may be directly involved in cholesteryl ester transfer protein (CETP) production 36 and has been proposed as a target for CVD 63. BRD4 is part of the bromodomain and extra-terminal (BET) protein family 44 and has been suggested to be of importance for integration of the endothelium. Inhibition of the BET reader protein has been suggested as a possible strategy in the prevention of adverse vascular remodeling 64.

Strengths and limitations
Strengths of the present study included multiple omics measures in a well-phenotyped cohort, and the familial relatedness in FHS, which could increase the likelihood of finding genetic mechanisms underlying MI given that coronary disease clusters in families. We further integrated evidence from multi-omics data to reduce false positive findings. Our study revealed multiple pathways possibly involved in the development of coronary disease. Our analyses should, however, be considered as hypothesis generating only. Although several pathways have been implicated in the pathogenesis of atherosclerosis and MI risk before, replication in independent cohorts would have further strengthened the plausibility of our findings. The use of whole blood to measure gene expression is feasible, yet it is an imprecise measure of the actual gene activity within coronary arteries and constitutes a limitation. Finally, this study includes only a moderate sample size with a very limited number of events and consists of a predominantly White population of European descent. Despite its limited sample size, the deep phenotyping, multi-omics measures, and multigenerational structure are unique features of the cohort, justifying the present set of analyses.

Conclusion
Using a trans-omic approach we integrated data from GWAS, DNA methylation, and gene expression to identify potential biological mechanisms in the development of CAC and MI. We identified several candidate genes for MI and CAC, of which many have been implicated in prior studies. Further research is still needed to confirm our findings and identify potential pathways for the prevention and treatment of coronary artery disease.

Table 5. Name, location, and annotation/function of genes identified in the top 100 for at least two outcomes. The table includes 13 genes that were identified in the top 100 of lowest rank-scores for at least two of the outcomes considered. All the top 100 genes for each outcome can be found in Fig. 1, where the 13 genes identified for more than one outcome are highlighted.

Figure 1. Top 100 genes for each outcome with the lowest trans-omic scores. Genes identified in the top 100 for more than one outcome are highlighted.

Table 1. Characteristics of study participants. All comorbidities are measured at baseline, i.e. the time of the Framingham Heart Study examination. BMI body mass index, MI myocardial infarction, CAC coronary artery calcification. *In our study population, 2589 (83.4%) had data on both GWAS and gene expression, 1842 (59.3%) had data on both GWAS and DNA methylation, 1690 (54.4%) had data on both gene expression and DNA methylation, and 1614 (52.0%) had data on all three omics.

Table 4. Trans-omic scores and p-values for genomics, epigenomics and transcriptomics association studies of prevalent and incident MI. MI myocardial infarction.